IEICE global.ieice.org Site

Author Search Result

[Author] Aki KOBAYASHI (60 hits)

Results 41-60 (of 60 hits)

Data-Parallel Volume Rendering with Adaptive Volume Subdivision
Kentaro SANO Hiroyuki KITAJIMA Hiroaki KOBAYASHI Tadao NAKAMURA

PAPER-Computer Graphics

Vol: E83-D, No: 1, Page(s): 80-89
A data-parallel processing approach is promising for real-time volume rendering because of the massive parallelism inherent in volume rendering. In data-parallel volume rendering, the local results that processing elements (PEs) generate from their allocated subvolumes are integrated to form a final image. This integration generally causes an overhead that is unavoidable in data-parallel volume rendering, owing to communication among PEs. This paper proposes a data-parallel shear-warp volume rendering algorithm combined with an adaptive volume subdivision method to reduce the communication overhead and improve processing efficiency. We implement the parallel algorithm on a message-passing multiprocessor system for performance evaluation. The experimental results show that the adaptive volume subdivision method can reduce the overhead and achieve higher efficiency than a conventional slab subdivision method.
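The integration step described above can be illustrated by compositing the PEs' partial images in depth order. The following is a minimal sketch, assuming each PE delivers a premultiplied RGBA partial image over a shared pixel grid and that the front-to-back order of the subvolumes is known; it uses the standard "over" operator and does not reproduce the paper's shear-warp algorithm or its adaptive subdivision.

```python
from typing import List, Tuple

# Premultiplied RGBA pixel (r, g, b, alpha): color already scaled by alpha.
Pixel = Tuple[float, float, float, float]


def over(front: Pixel, back: Pixel) -> Pixel:
    """Front-to-back 'over' operator for premultiplied RGBA pixels."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    t = 1.0 - fa  # transparency left after the front layer
    return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)


def integrate_partial_images(partials: List[List[Pixel]]) -> List[Pixel]:
    """Merge per-PE partial images ordered front (index 0) to back.

    Every partial image covers the same pixel grid.
    """
    result = list(partials[0])
    for image in partials[1:]:
        result = [over(f, b) for f, b in zip(result, image)]
    return result


if __name__ == "__main__":
    pe0 = [(0.5, 0.0, 0.0, 0.5)]  # half-opaque red subvolume in front
    pe1 = [(0.0, 0.0, 1.0, 1.0)]  # opaque blue subvolume behind it
    print(integrate_partial_images([pe0, pe1]))  # [(0.5, 0.0, 0.5, 1.0)]
```

Because "over" is associative (though not commutative), PEs can merge partial images pairwise in a tree rather than sequentially; each merge step is where inter-PE communication is incurred.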
Slot-Array Receiving Antennas Fed by Coplanar Waveguide for 700 GHz Submillimeter-Wave Radiation
Hiroaki KOBAYASHI Yasuhiko ABE Yoshizumi YASUOKA

PAPER-Phased Arrays and Antennas

Vol: E82-C, No: 7, Page(s): 1248-1252
Thin-film slot-array receiving antennas fed by coplanar waveguide (CPW) were fabricated on fused quartz substrates, and the antenna properties were investigated at 700 GHz. It was confirmed that the transmission efficiency of CPW was 0.83/λm, and the rate of radiated power from a slot antenna was 0.5 at 700 GHz. The fabricated antennas worked as expected from the theory based on the transmission line model, and the two-dimensional 83 slot-array antenna fed by CPW increased the power gain by 11 dB over a single-slot antenna. The power gain of the antenna was 13 dBi and the aperture efficiency was 40% when the 700 GHz-submillimeter wave was irradiated through the substrate.
Acceleration Techniques for the Network Inversion Algorithm
Hiroyuki TAKIZAWA Taira NAKAJIMA Masaaki NISHI Hiroaki KOBAYASHI Tadao NAKAMURA

LETTER-Bio-Cybernetics and Neurocomputing

Vol: E82-D, No: 2, Page(s): 508-511
We apply two acceleration techniques for the backpropagation algorithm to an iterative gradient descent algorithm called the network inversion algorithm. Experimental results show that these techniques are also quite effective in decreasing the number of iterations required to detect input vectors on the classification boundary of a multilayer perceptron.
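Network inversion keeps the trained weights fixed and runs gradient descent on the input instead. The sketch below shows this for a single sigmoid unit, with a target output of 0.5 so that the search ends on the classification boundary. The abstract does not name the two acceleration techniques; the momentum term here is only an assumed, representative backpropagation acceleration, and the weights, step size, and tolerance are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def invert(w: List[float], b: float, target: float, x0: List[float],
           lr: float = 0.5, momentum: float = 0.0,
           tol: float = 1e-4, max_iter: int = 10000) -> Tuple[List[float], int]:
    """Network inversion for one sigmoid unit y = sigmoid(w . x + b).

    The weights stay fixed; the input x is updated by gradient descent on
    E = 0.5 * (y - target)^2. A nonzero `momentum` adds a classic
    backpropagation-style momentum term to the input update.
    Returns the final input and the number of iterations used.
    """
    x = list(x0)
    velocity = [0.0] * len(x)
    for it in range(1, max_iter + 1):
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = y - target
        if abs(err) < tol:
            return x, it
        # dE/dx_i = (y - target) * y * (1 - y) * w_i
        grad = [err * y * (1.0 - y) * wi for wi in w]
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        x = [xi + v for xi, v in zip(x, velocity)]
    return x, max_iter
```

Calling `invert([1.0, -2.0], 0.5, 0.5, [3.0, 0.0])` with `momentum=0.0` and again with `momentum=0.5` returns inputs on the boundary `w . x + b = 0`; in this setting the momentum run needs noticeably fewer iterations.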
A New Linear Prediction Filter Based Adaptive Algorithm For IIR ADF Using Allpass and Minimum Phase System
James OKELLO Yoshio ITOH Yutaka FUKUI Masaki KOBAYASHI

PAPER-Digital Signal Processing

Vol: E81-A, No: 1, Page(s): 123-130
An adaptive infinite impulse response (IIR) filter implemented using an allpass and a minimum phase system has the advantage that its poles converge to the poles of the unknown system when the input is a white signal. However, when the input signal is colored, the convergence speed deteriorates considerably, to the point that the filter may fail to converge for certain colored signals. Furthermore, with a colored input signal, there is no guarantee that the poles of the adaptive digital filter (ADF) will converge to the poles of the unknown system. In this paper, we propose a method that uses a linear prediction filter to whiten the input signal and thereby improve the convergence characteristics. Computer simulation results confirm both the increase in convergence speed and the convergence of the ADF poles to the poles of the unknown system, even when the input is a colored signal.
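The whitening idea can be sketched with an adaptive FIR prediction-error filter: the filter predicts each sample from the previous ones, and the prediction error approaches white noise once the predictor converges. This is only an illustration under assumptions. The LMS update, predictor order, step size, and AR(1) test signal are hypothetical choices, and the allpass/minimum-phase IIR ADF that would consume the whitened signal is not modeled.

```python
import random
from typing import List


def lms_whitener(x: List[float], order: int = 2, mu: float = 0.01) -> List[float]:
    """Adaptive linear prediction-error filter trained with LMS.

    Predicts x[n] from the previous `order` samples and returns the
    prediction error e[n], which tends toward a white sequence once the
    predictor coefficients have converged.
    """
    a = [0.0] * order
    history = [0.0] * order  # history[k] holds x[n-1-k]
    e = []
    for xn in x:
        prediction = sum(ak * hk for ak, hk in zip(a, history))
        en = xn - prediction
        # LMS update of the predictor coefficients
        a = [ak + mu * en * hk for ak, hk in zip(a, history)]
        history = [xn] + history[:-1]
        e.append(en)
    return e


def lag1_corr(s: List[float]) -> float:
    """Normalized lag-1 autocorrelation (near 0 for a white sequence)."""
    m = sum(s) / len(s)
    num = sum((s[i] - m) * (s[i - 1] - m) for i in range(1, len(s)))
    den = sum((v - m) ** 2 for v in s)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    x, prev = [], 0.0
    for _ in range(20000):  # colored AR(1) input: x[n] = 0.9 x[n-1] + noise
        prev = 0.9 * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    e = lms_whitener(x)
    print(lag1_corr(x), lag1_corr(e[10000:]))
```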
A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts
Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-Computer System

Vol: E96-D, No: 9, Page(s): 2047-2054
Chip multiprocessors (CMPs) improve performance by simultaneously executing multiple threads on multiple integrated cores. However, since these cores commonly share one cache, inter-thread cache conflicts often limit the performance improvement gained by multi-threading. This paper focuses on two causes of inter-thread cache conflicts. In the shared caches of CMPs, data fetched by one thread are frequently evicted by another thread. Such an eviction, called an inter-thread kickout (ITKO), is one of the major causes of inter-thread cache conflicts. The other cause is capacity shortage, which occurs when one cache is shared by threads that demand large cache capacities: if the total capacity demanded by the threads exceeds the actual cache capacity, the threads compete for the limited capacity. To address inter-thread cache conflicts, both ITKOs and capacity shortage must be taken into account. Therefore, this paper proposes a capacity-aware thread scheduling method combined with cache partitioning, in which cache partitioning reduces the conflicts due to ITKOs and thread scheduling reduces the conflicts due to capacity shortage. The proposed scheduling method estimates the capacity demand of each thread using the estimation method of the cache partitioning mechanism; based on these estimates, the thread scheduler decides which threads share a cache so as to avoid capacity shortage. Evaluation results suggest that the proposed method can improve overall performance by up to 8.1%, and the performance of individual threads by up to 12%. The results also show that both cache partitioning and thread scheduling are indispensable for avoiding ITKOs and capacity shortage simultaneously. Accordingly, the proposed method can significantly reduce inter-thread cache conflicts and hence improve performance.
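The scheduling decision can be pictured as choosing thread pairs whose combined capacity demand fits the shared cache. The sketch below assumes two threads per cache and takes per-thread demand estimates as given inputs (in the paper they come from the partitioning mechanism's estimator). Pairing the largest remaining demand with the smallest is a hypothetical heuristic chosen for illustration, not the paper's scheduler.

```python
from typing import Dict, List, Tuple


def pair_threads(demand: Dict[str, int], cache_capacity: int) -> List[Tuple[str, str, bool]]:
    """Assign threads to shared caches, two threads per cache.

    `demand` maps thread name -> estimated cache capacity demand (for
    example, in cache ways). Pairing the largest remaining demand with the
    smallest minimizes the largest per-cache total. Each result is
    (thread_a, thread_b, fits); fits is False if the pair still causes
    capacity shortage.
    """
    order = sorted(demand, key=lambda t: demand[t], reverse=True)
    if len(order) % 2:
        raise ValueError("this sketch assumes an even number of threads")
    pairs = []
    for i in range(len(order) // 2):
        big, small = order[i], order[-1 - i]
        fits = demand[big] + demand[small] <= cache_capacity
        pairs.append((big, small, fits))
    return pairs


if __name__ == "__main__":
    # Hypothetical demands in ways for a 16-way shared cache.
    print(pair_threads({"A": 12, "B": 10, "C": 4, "D": 3}, 16))
    # [('A', 'D', True), ('B', 'C', True)]
```

Co-scheduling A with B instead would demand 22 ways from a 16-way cache, which is exactly the capacity shortage the scheduler tries to avoid.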
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering
Ken-ichi SUZUKI Yoshiyuki KAERIYAMA Kazuhiko KOMATSU Ryusuke EGAWA Nobuyuki OHBA Hiroaki KOBAYASHI

PAPER-Computer Graphics

Vol: E93-D, No: 4, Page(s): 891-902
Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive rendering of static scenes practical. This paper deals with interactive dynamic scene rendering, in which not only the eye point but also the objects in the scene change their 3D locations every frame. To realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which exploits coherency in rays, objects, and grouped rays, is introduced. RTRPS uses bounding spheres as its spatial data structure to exploit coherency in objects. Because it uses bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere and shorten the update time between frames. RTRPS exploits coherency in rays by merging rays into a ray-plane, assuming that secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for that ray can be completed using the coherency in the ray-planes. Owing to these three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product into a 2D operation and reduces the number of floating-point operations. Techniques based on frustum culling and binary-tree-structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, reducing the number of intersection tests by 50% to 90%. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS achieves speedups of 13 to 392 times over a ray tracing algorithm without organized rays and spheres. We also found that RTRPS provides competitive performance even when only primary rays are used.
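The basic primitive behind a bounding-sphere hierarchy is the ray-sphere test: if a ray misses a sphere, everything inside it can be skipped. The sketch below shows the standard single-ray test only; RTRPS's ray-plane grouping, parallel projection, and frustum-based ordering are not reproduced, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def ray_sphere_hit(origin: Sequence[float], direction: Sequence[float],
                   center: Sequence[float], radius: float) -> Optional[float]:
    """Return the nearest non-negative t where origin + t * direction hits
    the sphere, or None if the ray misses it."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    half_b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = half_b * half_b - a * c
    if disc < 0.0:
        return None  # miss: every object bounded by this sphere is culled
    root = math.sqrt(disc)
    for t in ((-half_b - root) / a, (-half_b + root) / a):
        if t >= 0.0:
            return t
    return None


if __name__ == "__main__":
    print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
    print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (3, 0, 5), 1.0))  # None
```

A sphere is unchanged by rotation about its center, which is why an object rotating inside its bounding sphere needs no bound update between frames.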
A Multiple Block-matching Step (MBS) Algorithm for H.26x/MPEG4 Motion Estimation and a Low-Power CMOS Absolute Differential Accumulator Circuit
Tadayoshi ENOMOTO Nobuaki KOBAYASHI Tomomi EI

PAPER-Digital

Vol: E90-C, No: 4, Page(s): 718-726
To drastically reduce the power dissipation (P) of an absolute difference accumulation (ADA) circuit for H.26x/MPEG4 motion estimation, a fast block-matching (BM) algorithm called the Multiple Block-matching Step (MBS) algorithm has been developed. The MBS algorithm can drastically improve block-matching speed while achieving the same visual quality as a full-search (FS) BM algorithm. The power dissipation P of a 0.18-µm CMOS ADA circuit employing the MBS algorithm is reduced to roughly 0.3% to 12% of that of the same ADA circuit adopting FS.
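For reference, the full-search baseline evaluates a sum of absolute differences (SAD), the quantity an absolute difference accumulator computes, at every candidate displacement in the search window. The abstract does not describe how MBS selects fewer candidates, so this sketch shows only the FS baseline; the frame representation, block size, and search range are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                r: int) -> Optional[Tuple[int, int, int]]:
    """Exhaustive block matching over a +/- r window.

    Returns (best_sad, dx, dy); candidates outside `ref` are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
    # Current frame = reference content shifted by (dx, dy) = (2, 1).
    cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0 for x in range(12)]
           for y in range(12)]
    print(full_search(cur, ref, 4, 4, 4, 3))  # (0, 2, 1)
```

A search range of r evaluates (2r + 1)^2 candidates per block, so the ADA work, and hence its power, scales with the number of candidates that a fast algorithm can avoid.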
Dynamic Activating and Deactivating Loss Recovery Router for Live Streaming Multicast
Yuthapong SOMCHIT Aki KOBAYASHI Katsunori YAMAOKA Yoshinori SAKAI

PAPER-Network

Vol: E89-B, No: 5, Page(s): 1534-1544
Live streaming is delay-sensitive but can tolerate some packet loss. The QoS Multicast for Live Streaming (QMLS) protocol focuses on these characteristics of live streaming and has been shown to improve the performance of live streaming multicast by reducing the end-to-end packet loss probability. However, the placement of the active routers that perform the QMLS function has not been discussed. This paper proposes a dynamic method to activate and deactivate routers so as to minimize the number of active routers for each QMLS packet flow, and discusses its parameters. Evaluation results show that the proposed method can reduce the number of active routers for each flow and adjust the set of active routers to changes in the multicast tree.
Low Dynamic Power and Low Leakage Power Techniques for CMOS Motion Estimation Circuits
Nobuaki KOBAYASHI Tomomi EI Tadayoshi ENOMOTO

PAPER-Low Power Techniques

Vol: E89-C, No: 3, Page(s): 271-279
To drastically reduce the dynamic power (PAT) and the leakage power (PST) of CMOS MPEG4/H.264 motion estimation (ME) circuits, several power reduction techniques were developed: circuit architectures that reduce the supply voltage (VDD) and the number of logic gates not only of the whole circuit but also of the critical path, a fast motion estimation algorithm, and a leakage current reduction circuit. A 0.18-µm CMOS ME circuit was fabricated using these techniques. At a clock frequency of 160 MHz and a VDD of 1.25 V, PAT decreased to 75.9 µW, which was 5.35% of that of a conventional ME circuit, and PST decreased to 0.82 nW, which was 3.93% of that of the conventional ME circuit.
A Low Power Multimedia Processor Implementing Dynamic Voltage and Frequency Scaling Technique and Fast Motion Estimation Algorithm Called “Adaptively Assigned Breaking-Off Condition (A²BC)”
Tadayoshi ENOMOTO Nobuaki KOBAYASHI

PAPER

Vol: E96-C, No: 4, Page(s): 424-432
A motion estimation (ME) multimedia processor was developed that employs a dynamic voltage and frequency scaling (DVFS) technique to greatly reduce power dissipation. To take full advantage of DVFS, a fast ME algorithm was also developed. It adaptively predicts the optimum supply voltage and the optimum clock frequency for each macro-block before the ME process for encoding that macro-block starts. The power dissipation of the 90-nm CMOS DVFS-controlled multimedia processor, which contains an absolute difference accumulator as well as a small on-chip DC/DC level converter, a minimum value detector, and a DVFS controller, was reduced to 38.48 µW, only 3.261% of that of a conventional multimedia processor.
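Why lowering voltage and frequency together pays off can be seen from the standard first-order CMOS dynamic-power relation (a textbook model, not stated in the abstract; it ignores leakage and assumes the supply voltage can be scaled in proportion to the clock frequency):

```latex
P_{\mathrm{dyn}} = \alpha\, C_{\mathrm{L}}\, V_{\mathrm{DD}}^{2}\, f_{\mathrm{clk}},
\qquad
\frac{P_{\mathrm{dyn}}(k V_{\mathrm{DD}},\, k f_{\mathrm{clk}})}{P_{\mathrm{dyn}}(V_{\mathrm{DD}},\, f_{\mathrm{clk}})} = k^{3},
\qquad
\text{e.g. } k = 0.5 \;\Rightarrow\; 0.125 .
```

Here α is the switching activity and C_L the switched capacitance. Under this model, a macro-block that needs only half the cycles can run at half the frequency and voltage and, to first order, draw one eighth of the dynamic power, which is why predicting the workload per macro-block before processing matters.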
Complex-Valued Bipartite Auto-Associative Memory
Yozo SUZUKI Masaki KOBAYASHI

PAPER-Nonlinear Problems

Vol: E97-A, No: 8, Page(s): 1680-1687
Complex-valued Hopfield associative memory (CHAM) is one of the most promising neural network models for dealing with multilevel information. CHAM has an inherent property of rotational invariance, which reduces the network's robustness to noise; this is a critical problem. Here, we propose complex-valued bipartite auto-associative memory (CBAAM) to overcome this reduction in noise robustness. CBAAM consists of two layers: a visible complex-valued layer and an invisible real-valued layer. The invisible real-valued layer prevents rotational invariance and the resulting reduction in noise robustness. In addition, unlike CHAM, CBAAM has high parallelism. Computer simulations show that CBAAM is superior to CHAM in noise robustness: the noise robustness of CHAM decreased as the resolution factor increased, whereas CBAAM provided high noise robustness independent of the resolution factor.
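Rotational invariance in CHAM can be demonstrated directly: if a pattern is a fixed point, multiplying every neuron state by the same phasor gives another fixed point, so rotated copies of each stored pattern compete as attractors. The sketch below assumes a common CHAM formulation (K-state phasor neurons, Hebbian weights without self-connections, synchronous update); it illustrates CHAM only, not the proposed CBAAM.

```python
import cmath
import math
from typing import List


def csign(z: complex, K: int) -> complex:
    """Quantize the phase of z to the nearest of K equally spaced unit phasors."""
    step = 2 * math.pi / K
    k = round(cmath.phase(z) / step) % K
    return cmath.exp(1j * step * k)


def hebbian_weights(patterns: List[List[complex]]) -> List[List[complex]]:
    """w_jk = sum_p x_j^p * conj(x_k^p), with no self-connections."""
    n = len(patterns[0])
    return [[0j if j == k else sum(p[j] * p[k].conjugate() for p in patterns)
             for k in range(n)] for j in range(n)]


def update(W: List[List[complex]], x: List[complex], K: int) -> List[complex]:
    """One synchronous update of every neuron."""
    return [csign(sum(wjk * xk for wjk, xk in zip(row, x)), K) for row in W]


if __name__ == "__main__":
    K = 8
    step = 2 * math.pi / K
    x = [cmath.exp(1j * step * k) for k in [0, 3, 5, 1, 6, 2]]
    W = hebbian_weights([x])
    rotated = [cmath.exp(1j * step) * v for v in x]  # every phase shifted by 2*pi/K
    # Both the stored pattern and its rotated copy are fixed points.
    print(max(abs(a - b) for a, b in zip(update(W, x, K), x)))
    print(max(abs(a - b) for a, b in zip(update(W, rotated, K), rotated)))
```

Since W(e^{iθ}x) = e^{iθ}Wx, the rotated state lands on the rotated pattern; with a larger resolution factor K there are more such rotated attractors, which is consistent with the abstract's observation that CHAM's noise robustness drops as the resolution factor grows.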
Uniqueness Theorem of Complex-Valued Neural Networks with Polar-Represented Activation Function
Masaki KOBAYASHI

PAPER-Nonlinear Problems

Vol: E98-A, No: 9, Page(s): 1937-1943
Several models of feed-forward complex-valued neural networks have been proposed, and those with split and polar-represented activation functions have mainly been studied. Neural networks with split activation functions are relatively easy to analyze, whereas those with polar-represented activation functions have many applications but are difficult to analyze. In previous research, Nitta proved the uniqueness theorem of complex-valued neural networks with split activation functions. He subsequently studied their critical points, which cause plateaus and local minima in the learning process. The uniqueness theorem is thus closely related to the learning process. In the present work, we first define three types of reducibility for feed-forward complex-valued neural networks with polar-represented activation functions and prove that reducible complex-valued neural networks can easily be transformed into irreducible ones. We then prove the uniqueness theorem of complex-valued neural networks with polar-represented activation functions.
Hybrid Quaternionic Hopfield Neural Network
Masaki KOBAYASHI

PAPER-Nonlinear Problems

Vol: E98-A, No: 7, Page(s): 1512-1518
In recent years, applications of complex-valued neural networks have become widespread. Quaternions are an extension of complex numbers, and neural networks with quaternions have been proposed. Because quaternion multiplication is non-commutative, there are two possible orders of multiplication for calculating the weighted input; however, both orders provide almost the same performance. We propose hybrid quaternionic Hopfield neural networks, which use both orders of multiplication. Computer simulations show that these networks outperform conventional quaternionic Hopfield neural networks in noise tolerance. We discuss why hybrid quaternionic Hopfield neural networks improve noise tolerance from the standpoint of rotational invariance.
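The two multiplication orders come from the non-commutativity of the Hamilton product (for example, i·j = k but j·i = -k). The sketch below computes a weighted input with either order per connection. How the paper assigns orders in its hybrid network is not stated in the abstract, so the per-connection `left` flags are a hypothetical parameter for illustration.

```python
from typing import List, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z) = w + x i + y j + z k


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product p * q (non-commutative in general)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def weighted_input(weights: List[Quat], states: List[Quat], left: List[bool]) -> Quat:
    """Sum over connections of w * x (left[i] True) or x * w (left[i] False).

    All True or all False gives one of the two conventional orders;
    mixing them is a hybrid with a hypothetical per-connection choice.
    """
    total = (0.0, 0.0, 0.0, 0.0)
    for w, x, is_left in zip(weights, states, left):
        prod = qmul(w, x) if is_left else qmul(x, w)
        total = tuple(t + p for t, p in zip(total, prod))
    return total


if __name__ == "__main__":
    i, j = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)
    print(qmul(i, j), qmul(j, i))  # i*j = k, j*i = -k
```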
FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms
Masayuki SATO Ryusuke EGAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER

Vol: E98-C, No: 7, Page(s): 550-558
As the energy consumption of cache memories increases, energy-efficient cache management mechanisms are required. While dynamic cache resizing is one promising approach to reducing the energy consumption of microprocessors, its effect is limited by dead-on-fill blocks, which are never used between being filled into the cache memory and being evicted from it. To solve this problem, this paper proposes a cache management policy named FLEXII, which reduces the number of dead-on-fill blocks and thereby helps dynamic cache resizing mechanisms further reduce the energy consumption of cache memories.
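Why the insertion position matters for dead-on-fill blocks can be illustrated with one LRU-managed cache set. The abstract does not describe how FLEXII works, so this sketch is not FLEXII: it compares conventional MRU insertion with simple LRU-position insertion, a well-known alternative policy, and does not model cache resizing. The trace and the associativity are hypothetical.

```python
from typing import List


def count_hits(trace: List[str], ways: int, insert_at: str) -> int:
    """Simulate one LRU-managed cache set and count hits.

    insert_at='mru' is conventional insertion; insert_at='lru' places newly
    filled blocks at the LRU position, so a block that is never reused
    (dead-on-fill) becomes the next victim instead of pushing out live blocks.
    """
    stack: List[str] = []  # index 0 = MRU, last element = LRU
    hits = 0
    for block in trace:
        if block in stack:
            hits += 1
            stack.remove(block)
            stack.insert(0, block)  # promote to MRU on a hit
            continue
        if len(stack) == ways:
            stack.pop()  # evict the LRU block
        if insert_at == "mru":
            stack.insert(0, block)
        else:
            stack.append(block)
    return hits


if __name__ == "__main__":
    # A reused working set (A, B, C) interleaved with one-shot blocks X0..X9.
    trace = []
    for i in range(10):
        trace += ["A", "B", "C", f"X{i}"]
    print(count_hits(trace, 3, "mru"), count_hits(trace, 3, "lru"))  # 0 18
```

With MRU insertion every one-shot block displaces part of the working set and the set never hits; with LRU-position insertion the one-shot blocks evict each other, so A and B keep hitting.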
An Adaptive Algorithm for Cascaded Notch Filter with Reduced Bias
James OKELLO Shin'ichi ARITA Yoshio ITOH Yutaka FUKUI Masaki KOBAYASHI

PAPER-Digital Signal Processing

Vol: E84-A, No: 2, Page(s): 589-596
In this paper, we propose a new simplified algorithm for cascaded second-order adaptive notch filters implemented using allpass filters, for the elimination of multiple sinusoids. Each stage of the notch filter is implemented using a direct-form second-order allpass filter. We also present an analysis that compares the proposed algorithm with the conventional simplified algorithm and indicates that the proposed algorithm has reduced bias in the estimation of the multiple input sinusoids. Simulation results confirm this analysis.
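The filter structure can be sketched with fixed, known notch frequencies: each stage forms H(z) = (1 + A(z))/2 from a direct-form second-order allpass A(z), which cancels the input at the frequency where A has a phase of -π. The paper's contribution, the adaptive update of the notch frequencies with reduced bias, is not implemented here. The (k1, k2) parametrization is a common one assumed for illustration, and the test frequencies are hypothetical.

```python
import math
from typing import List


def allpass_notch(x: List[float], w0: float, k2: float = 0.9) -> List[float]:
    """Second-order notch H(z) = (1 + A(z)) / 2, where
    A(z) = (k2 + a1 z^-1 + z^-2) / (1 + a1 z^-1 + k2 z^-2),
    a1 = k1 * (1 + k2) and k1 = -cos(w0).

    The notch is at w0 (rad/sample); k2 < 1 sets the bandwidth
    (closer to 1 gives a narrower notch).
    """
    a1 = -math.cos(w0) * (1.0 + k2)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = k2 * xn + a1 * x1 + x2 - a1 * y1 - k2 * y2  # direct-form allpass
        out.append(0.5 * (xn + yn))
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return out


def cascade(x: List[float], freqs: List[float], k2: float = 0.9) -> List[float]:
    """One notch stage per sinusoid to be eliminated."""
    for w0 in freqs:
        x = allpass_notch(x, w0, k2)
    return x


if __name__ == "__main__":
    w1, w2 = 0.2 * math.pi, 0.6 * math.pi
    x = [math.sin(w1 * n) + 0.5 * math.sin(w2 * n + 0.3) for n in range(3000)]
    tail = cascade(x, [w1, w2])[-500:]
    print(max(abs(v) for v in tail))  # close to zero once transients decay
```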
A Metadata Prefetching Mechanism for Hybrid Memory Architectures (Open Access)
Shunsuke TSUKADA Hikaru TAKAYASHIKI Masayuki SATO Kazuhiko KOMATSU Hiroaki KOBAYASHI

PAPER

Publicized: 2021/12/03
Vol: E105-C, No: 6, Page(s): 232-243
A hybrid memory architecture (HMA) that consists of several distinct memory devices is expected to achieve a good balance between high performance and large capacity. Unlike conventional memory architectures, an HMA needs metadata for data management, since data are migrated between the memory devices during the execution of an application. The memory controller caches the metadata to avoid accessing the memory devices for metadata references. However, because the amount of metadata grows in proportion to the size of the HMA, the memory controller must handle a large amount of metadata. As a result, it cannot cache all of the metadata, and the number of metadata references to the memory devices increases. This lengthens the access latency to reach the target data and degrades performance. To solve this problem, this paper proposes a metadata prefetching mechanism for HMAs, which loads the metadata needed in the near future in advance. Moreover, to increase the effect of metadata prefetching, the proposed mechanism predicts the metadata to be used in the near future based on the address difference between two consecutive access addresses. The evaluation results show that the proposed mechanism can improve instructions per cycle (IPC) by up to 44%, and by 9% on average.
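Prediction from the address difference can be sketched as a delta predictor: the next address is guessed as the current address plus the last observed delta, and the metadata entry covering that address is fetched early. The sketch is only an assumption-laden illustration. The metadata granularity (`entry_bytes`), the small FIFO metadata cache, and prefetching a single entry per access are hypothetical choices, not details from the paper.

```python
from typing import List, Optional


class DeltaMetadataPrefetcher:
    """Caches metadata entries (one per `entry_bytes` of address space) and,
    when enabled, prefetches the entry for address + last delta."""

    def __init__(self, entry_bytes: int = 4096, capacity: int = 4,
                 prefetch: bool = True) -> None:
        self.entry_bytes = entry_bytes
        self.capacity = capacity
        self.prefetch = prefetch
        self.cache: List[int] = []  # cached metadata entry ids, oldest first
        self.last_addr: Optional[int] = None
        self.misses = 0  # metadata references that must go to memory

    def _fill(self, entry: int) -> None:
        if entry in self.cache:
            return
        if len(self.cache) == self.capacity:
            self.cache.pop(0)  # FIFO replacement
        self.cache.append(entry)

    def access(self, addr: int) -> bool:
        entry = addr // self.entry_bytes
        hit = entry in self.cache
        if not hit:
            self.misses += 1
            self._fill(entry)
        if self.prefetch and self.last_addr is not None:
            delta = addr - self.last_addr
            if delta != 0 and addr + delta >= 0:
                self._fill((addr + delta) // self.entry_bytes)  # predicted next entry
        self.last_addr = addr
        return hit


if __name__ == "__main__":
    for enabled in (False, True):
        p = DeltaMetadataPrefetcher(prefetch=enabled)
        for i in range(20):
            p.access(i * 8192)  # strided accesses, 8 KiB apart
        print(enabled, p.misses)  # False 20, then True 2
```

For a regular stride, only the first two accesses miss once prefetching is on; irregular access patterns would of course benefit less from a single-delta predictor.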
A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning
Shoichi HIRASAWA Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-Software

Publicized: 2015/09/15
Vol: E98-D, No: 12, Page(s): 2178-2186
Automatic performance tuning of a practical application can be time-consuming and sometimes infeasible, because it often requires evaluating the performance of a large number of code variants to find the best one. This paper therefore proposes a light-weight rollback mechanism for evaluating each code variant at low cost. In the proposed mechanism, once a code variant of a target code block has been executed, the execution state is rolled back to the state just before the block was executed, so that only the block is executed repeatedly to find the best code variant. The mechanism can also terminate a code variant whose execution time exceeds the shortest execution time found so far. As a result, it avoids executing the whole application many times and thus reduces the time overhead of the auto-tuning process required to find the best code variant.
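The evaluate-then-roll-back loop, including early termination of slow variants, can be sketched as follows. This is a simplified model under stated assumptions: a deep copy of a Python dict stands in for the execution-state checkpoint, and each variant is a generator that yields once per unit of work, so the yield count serves as a deterministic stand-in for execution time (a real tuner would use timers and a far lighter checkpoint). The function and variant names are hypothetical.

```python
import copy
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A code variant mutates `state` and yields once per unit of work.
Variant = Callable[[Dict], Iterator[None]]


def tune(state: Dict, variants: List[Variant]) -> Tuple[int, Optional[int]]:
    """Run each variant on identical input, rolling back after each run.

    Returns (index of the best variant, its cost). A variant is abandoned
    as soon as its cost exceeds the best cost found so far.
    """
    best_idx, best_cost = -1, None
    for idx, variant in enumerate(variants):
        snapshot = copy.deepcopy(state)  # checkpoint before the target block
        cost, finished = 0, True
        for _ in variant(state):
            cost += 1
            if best_cost is not None and cost > best_cost:
                finished = False  # already slower than the best: stop early
                break
        state.clear()  # roll back so the next variant sees the same input
        state.update(snapshot)
        if finished and (best_cost is None or cost < best_cost):
            best_idx, best_cost = idx, cost
    return best_idx, best_cost


if __name__ == "__main__":
    def one_step_per_item(st: Dict) -> Iterator[None]:
        for v in st["data"]:
            st["acc"] += v
            yield

    def single_step(st: Dict) -> Iterator[None]:
        st["acc"] += sum(st["data"])
        yield

    st = {"data": [1, 2, 3, 4], "acc": 0}
    print(tune(st, [one_step_per_item, single_step]), st)  # (1, 1) and unchanged state
```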
Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems
Muhammad ALFIAN AMRIZAL Atsuya UNO Yukinori SATO Hiroyuki TAKIZAWA Hiroaki KOBAYASHI

PAPER-High performance computing

Publicized: 2017/07/14
Vol: E100-D, No: 12, Page(s): 2749-2760
Coordinated checkpointing is a widely used checkpoint/restart (CPR) protocol for fault tolerance in large-scale HPC systems. However, this protocol will involve massive I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that distributes checkpoints over time to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared with coordinated checkpointing, speculative checkpointing can achieve up to an 11% energy reduction at the cost of a relatively small increase in execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.
Simplification of Liquid Dielectric Property Evaluation Based on Comparison with Reference Materials and Electromagnetic Analysis Using the Cut-Off Waveguide Reflection Method
Kouji SHIBATA Masaki KOBAYASHI

PAPER

Vol: E100-C, No: 10, Page(s): 908-917
In this study, evaluation expressions based on comparison with reference materials were examined for the coaxial feed-type open-ended cut-off circular waveguide reflection method, to support simple and instantaneous evaluation of the dielectric constants of small amounts of scarce liquids over a broad frequency range. S11 values were determined by electromagnetic analysis, without actual S11 measurement, for individual jig structures and dielectric property values, with the tip of the measurement jig in open-ended and short-ended conditions and with the test material inserted. Next, the relationships linking jig structure, dielectric properties, and S11 were stored in a database to simplify the procedure and improve accuracy in reference-material evaluation. To verify the effectiveness of the proposed method, the accuracy of the estimation formula was first verified theoretically for cases in which the dielectric properties of the reference material and the actual material differed significantly. The results indicated that the dielectric property values of various liquids measured at 0.5 and 1.0 GHz using the proposed method corresponded closely to those obtained using the method previously proposed by the authors. The effectiveness of the proposed method was further evaluated by determining the dielectric properties of certain liquids at continuous frequencies over the octave range between 0.5 and 1.0 GHz, based on interpolation from limited data at several frequencies. The results indicated that the approach enables quicker and easier measurement of the complex permittivity of liquids over a broad frequency range than the previous method.
Quantized Decoder Adaptively Predicting both Optimum Clock Frequency and Optimum Supply Voltage for a Dynamic Voltage and Frequency Scaling Controlled Multimedia Processor
Nobuaki KOBAYASHI Tadayoshi ENOMOTO

PAPER-Electronic Circuits

Vol: E101-C, No: 8, Page(s): 671-679
To fully utilize the advantages of dynamic voltage and frequency scaling (DVFS) techniques, a quantized decoder (QNT-D) was developed. The QNT-D generates a quantized signal processing quantity (Q) from a predicted signal processing quantity (M). Q is used to produce the optimum clock frequency (opt.fc) and the optimum supply voltage (opt.VD), both of which are proportional to Q. To develop a DVFS-controlled motion estimation (ME) processor, we used both the QNT-D and a fast ME algorithm called A²BC (Adaptively Assigned Breaking-off Condition) to predict M for each macro-block (MB). A DVFS-controlled ME processor was fabricated using 90-nm CMOS technology. Its total power dissipation (PT) was significantly reduced, varying from 38.65 to 99.5 µW depending on the test video picture, which is only 3.27% to 8.41% of the PT of a conventional ME processor.
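The decoder's role (predicted quantity M in, quantized level Q out, with frequency and voltage proportional to Q) can be sketched as below. The abstract states only the proportionality; the uniform quantization, rounding up so the chosen frequency never leaves too few cycles for M, the number of levels, and the maximum frequency and voltage values are all hypothetical.

```python
import math
from typing import Tuple


def quantized_decoder(m: int, m_max: int, levels: int,
                      f_max_hz: float, v_max: float) -> Tuple[int, float, float]:
    """Map a predicted processing quantity M to a level Q in 1..levels and to
    a clock frequency and supply voltage proportional to Q.

    Q is rounded up so that the selected frequency always provides enough
    cycles to process M within the macro-block time slot.
    """
    if not 0 <= m <= m_max:
        raise ValueError("M out of range")
    q = max(1, math.ceil(levels * m / m_max))
    return q, f_max_hz * q / levels, v_max * q / levels


if __name__ == "__main__":
    # Hypothetical: M up to 256 units, 8 levels, 160 MHz / 1.2 V at full level.
    print(quantized_decoder(100, 256, 8, 160e6, 1.2))  # level 4: 80 MHz, about 0.6 V
```

This follows the abstract's proportionality literally; practical designs also respect a minimum operating voltage, which a real decoder table would encode.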
